Geospatial Data Analysis in R

author: Timothy H. Keitt date: May 12, 2014 width: 1440 height: 900

Instructors

type: section

Tim Keitt <tkeitt@utexas.edu>

Daniel Mitchell <daniel.lewis.mitchell@gmail.com>

Tim’s research

I study the spatial and temporal organization of populations, communities and ecosystems.

More at http://www.keittlab.org/

Tim’s research

Jaguar landscape

A little history

A little history

A little history

A little history

New developments

Why R?

type: section

Why R?

Why R?

CRAN package growth


Why R?


reproducible research book

Caveat emptor!

Why R for GIS?

type: section

Why R for GIS?

Why R for GIS?

mesh grid

A quick example

library(sp)
data(meuse)
coordinates(meuse) <- ~x+y # coerce to SpatialPointsDataFrame
library(lattice) # spplot is based on the lattice package
trellis.par.set(sp.theme()) # load some custom colors into lattice graphics

A quick example

spplot(meuse, "dist")

plot of chunk unnamed-chunk-2

Another example

library(sp)
library(maptools) # used here to read shapefiles
orig.wd = getwd()
setwd("example-data/NCEAS sample")
states <- readShapePoly("western-states")
reservoirs <- readShapePoly("western-reservoirs")
rivers <- readShapeLines("western-rivers")
dams <- readShapePoints("western-dams")
setwd(orig.wd)

Another example

plot.it = function()
{
  plot(states, border = "wheat3", col = "wheat1")
  lines(rivers, col = "dodgerblue3")
  plot(reservoirs, col = "darkgreen", border = "darkgreen", add = TRUE)
  points(dams, cex = 1.4, col = "darkred")
  text(dams, labels = as.character(dams$DAM_NAME), col = "darkred", cex = 0.6, font = 2, offset = 0.5, adj = c(0,2))
  title("Major Dams of the Western United States")
  legend("bottomleft", legend = c("Major rivers", "Reservoirs", "Dams"), title = "Legend", bty = "n", inset = 0.05, lty = c( 1,-1,-1), pch = c(-1,15, 1), col = c("dodgerblue3", "darkgreen", "darkred"))
}

Another example

plot of chunk unnamed-chunk-5

Geospatial data concepts

type: section

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Topologies - Networks - Accuracy and precision - Map projections - Spatial indices

Types of geospatial data

Vector data - Points - Lines - Polygons


Types of geospatial data

Raster data - Spatial grids - Lookup tables


raster

Types of geospatial data

Topology - Areal data - Vertices - Faces


topology

Types of geospatial data

Network - Relational - Vertices - Edges - Planar network - Topology is a special case


network

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

Geospatial data concepts

OGC Simple Feature Hierarchy

Geospatial data concepts

OGC Simple Feature Hierarchy

Geospatial data concepts

OGC Simple Feature Predicates

Geospatial data concepts

OGC Simple Feature Set Operations

Handled by GEOS library; bindings in rgeos package

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

Geospatial data concepts

Raster data

Location is implicit using row and column offsets

Geospatial data concepts

Raster data


Geospatial data concepts

Raster data

For geospatially registered data, two coordinate systems - Row and column offsets - Geospatial coordinates

For many spatial reference systems, conversion from map coordinates to raster offsets can be achieved via an affine transform.

\[ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} + \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} c \\ r \end{bmatrix} \]

\[ \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1} \left( \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \right) = \begin{bmatrix} c \\ r \end{bmatrix} \]

If \(a_{11}\) or \(a_{22}\) are negative, then axis is “flipped”.

Geospatial data concepts

Affine transforms

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

Geospatial data concepts

Accuracy and precision


Geospatial data concepts

Accuracy and precision


Geospatial data concepts

Resampling


Geospatial data concepts

Resampling methods


Geospatial data concepts

Accuracy and precision


Geospatial data concepts

Accuracy and precision

Geospatial data concepts

Complex versus simple features

Sources of imprecision - Measurement, recording and transcription errors - Have to clean the data - “Snapping” to reduced precision coordinates sometimes works - Numerical imprecision - Floating point round-off and other effects - Snapping may help - Infinite precision arithmetic possible


Geospatial data concepts

Complex versus simple features

Most spatial operators in R require simple features

Geospatial data concepts

Complex versus simple features

Most spatial operators in R require simple features

Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

Geospatial data concepts

Map projections


Geospatial data concepts

Map projections

Latitude and longitude are natural coordinates

Geospatial data concepts

Map projections

Nonetheless they are still relative to a model of earth

Geospatial data concepts

Map projections


Geospatial data concepts

Map projections


Geospatial data concepts

Map projections

Geospatial data concepts

Map projections


Geospatial data concepts

Example systems


Geospatial data concepts

Example coordinate systems


Geospatial data concepts

Example coordinate systems


Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

Geospatial data concepts

Map queries


Geospatial data concepts

Map queries


Geospatial data concepts

Map queries


Geospatial data concepts

type: sub-section - Types of geospatial data - Vectors and simple features - Raster data - Accuracy and precision - Map projections - Spatial indices

RefresheR

type: section

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

R interprets expressions

x = 3; print(x)
[1] 3
y = 2 * x + 4; print(y)
[1] 10

RefresheR

Getting help

help(ls)
?ls
??predict
help(package = stats)

Try googling “<topic> in R”

RefresheR

Session environment

library(nlme)    # attach package
getwd()          # where am I?
setwd("my.dir")  # go there
ls()             # list R objects
dir()            # list files
q()              # all done

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Basic IO

x = read.csv('the-data.csv')  # 90% of what I do
x = read.table('other-data.asc', header = TRUE, as.is = TRUE)
write.csv(object, file = 'output.csv')

These return data frames. More on that in a bit.

RefresheR

Database access

library(RPostgreSQL)
drv = dbDriver("PostgreSQL")
con = dbConnect(drv, dbname = "testing")
res = dbSendQuery(con, "select length from coastlines")
dframe = fetch(res)

Enables powerful SQL queries to RDMS. See also the sqldf package.

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Assignment

a = 1
b <- 2
3 -> c
print(a); print(b); print(c)
[1] 1
[1] 2
[1] 3

I will use = and not <- or ->. R does not care.

RefresheR

Control flow (compact)

if ( TRUE ) print("yes") else print("no")
[1] "yes"
z = rep(1, 10); print(z)
 [1] 1 1 1 1 1 1 1 1 1 1
for ( i in 3:10 ) z[i] = z[i-1] + z[i - 2]; print(z)
 [1]  1  1  2  3  5  8 13 21 34 55

if and for are sufficient for the vast majority of programs

RefresheR

Control flow (better layout)

i = 7
while ( i )
{
  z[i] = z[i] / 2
  i = i - 1
  if ( i < 3 ) break
}
print(z)  # divide elements 4:7 by 2
 [1]  1.0  1.0  1.0  1.5  2.5  4.0  6.5 21.0 34.0 55.0

while is less common but useful in cases with an indeterminate number of loops

RefresheR

Control flow (better layout)

i = 7
while ( i )
{
  z[i] = z[i] / 2
  i = i - 1
  if ( i < 3 ) break
}

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

R expressions are vectorized

x = 1:5; print(x)
[1] 1 2 3 4 5
y = 2 * x; print(y)
[1]  2  4  6  8 10

RefresheR

The rule is that the expression is evaluated for each element

z = 1:5; print(2 * z)
[1]  2  4  6  8 10
for ( i in 1:5 )
{
  z[i] = 2 * z[i]  
}
print(z)
[1]  2  4  6  8 10

RefresheR

Functions may or may not return a value for each element of an input vector

print(sqrt(1:5))         # a 'map'
[1] 1.000 1.414 1.732 2.000 2.236
print(sum(1:5))          # a 'reduce'
[1] 15
print(summary(1:5))      # a more complex 'reduce'
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       2       3       3       4       5 

Can be tricky in complex code

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

2-D indices are nested

a = matrix(1:9, 3)
print(a)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
print(a[1:2, 3:2])
     [,1] [,2]
[1,]    7    4
[2,]    8    5

RefresheR

2-D indices are nested

b = matrix(NA, 2, 2)
ri = 1:2; ci = 3:2
for ( i in seq(along = ri) )   # for each index in ri
  for ( j in seq(along = ci) ) # loop over indices in ci
    b[i, j] = a[ri[i], ci[j]]  # ith ri and jth ci
print(b)
     [,1] [,2]
[1,]    7    4
[2,]    8    5
print(a[ri, ci])    # equivalent expression
     [,1] [,2]
[1,]    7    4
[2,]    8    5

You can omit braces for a single loop expression (not preferred)

RefresheR

Rule is if no comma, then use each element of index

print(a)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
print(a[1:5]) # 1-D index gives 1-D result
[1] 1 2 3 4 5

Matrices are stored column-wise

RefresheR

Index can be higher dimensional

i = which(diag(3) == 1, arr.ind = TRUE); print(i)
     row col
[1,]   1   1
[2,]   2   2
[3,]   3   3
print(diag(a[i])) # a[i[1,]], a[i[2,]]...
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    5    0
[3,]    0    0    9
print(a[i[, 1], i[, 2]]) # a[i[1, 1], i[1, 2]]...
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

RefresheR

Note the difference

print(a[i])
[1] 1 5 9
print(a[i[, 1], i[, 2]])
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Multiple indices separated by commas are nested

RefresheR

Using indices creatively is often the fastest way to extract and rearrange data in R

i = rep(1:3, each = 2); print(i)
[1] 1 1 2 2 3 3
print(a[i, i])
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    4    4    7    7
[2,]    1    1    4    4    7    7
[3,]    2    2    5    5    8    8
[4,]    2    2    5    5    8    8
[5,]    3    3    6    6    9    9
[6,]    3    3    6    6    9    9

RefresheR

Using indices creatively is often the fastest way to extract and rearrange data in R

i = 1:3; print(sample(i))
[1] 2 1 3
print(a[sample(i), sample(i)]) # row-column permutation
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
[3,]    1    4    7

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Functions

f = function(a, b = 2, c = NULL)
{
  res = a * b
  if ( ! is.null(c) ) res = res * c
  return(res)
}
print(f(1, 2, 3)); print(f(4))
[1] 6
[1] 8

Using NULL as a flag facilitates reuse

RefresheR

Functions are objects

print(class(f))
[1] "function"
print(formals(f))
$a


$b
[1] 2

$c
NULL

RefresheR

Functions are objects

print(body(f))
{
    res = a * b
    if (!is.null(c)) 
        res = res * c
    return(res)
}
body(f) = "gotcha"
print(f(6))
[1] "gotcha"

RefresheR

Scope

c = "mice"                # global
f = function(a = "three") # function formals
{
  b = "blind"             # function body
  return(paste(a, b, c))  # three scopes
}
print(f())
[1] "three blind mice"

Object not found in current scope initiates search upward into enclosing scopes

RefresheR

Scope

x = 2                      # global scope
f = function() x = 2 * x   # 2 different x's here
print(x); print(f()); print(x)
[1] 2
[1] 4
[1] 2

The assignment in the function body creates a variable x whose scope is the function body

RefresheR

Useful for closures

f = function()
{
  x = sum(rnorm(100))    # 1-time stuff here 
  function(y) x * y      # return a function
}
g = f()                  # closure factory
print(c(g(2), f()(2)))   # uses x created by f()
[1] -30.34  21.47

The factory function is just a convenient way to bind an anonymous environment to the returned closure. I use this all the time to speed up calculations.

RefresheR

Function objects and closures are the key to functional programming

Not functional

x = matrix(rnorm(25), 5)
row.sums = rep(NA, 5)
for ( i in 1:5 ) row.sums[i] = sum(x[i, ])
print(row.sums)
[1]  0.6581 -3.5508  3.7498  1.2681 -0.3079

RefresheR

Function objects and closures are the key to functional programming

Functional

f = function(i) sum(x[i, ]) # x global scope
row.sums = sapply(1:5, f)   # f(1), f(2)...
print(row.sums)
[1]  0.6581 -3.5508  3.7498  1.2681 -0.3079

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Arrays, including vectors and matrices, hold one type; lists hold different types

# coerced to character
print(c(1, 'a', TRUE))
[1] "1"    "a"    "TRUE"
# retain individual types
print(list(1, 'a', TRUE)) 
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

RefresheR

Single v. double braces

x = as.list(1:3)
print(x[2]); print(class(x[2]))      # returns list
[[1]]
[1] 2
[1] "list"
print(x[[2]]); print(class(x[[2]]))  # returns list element
[1] 2
[1] "integer"

RefresheR

Data frames are lists of vectors

a = 1:3
b = factor(1:3)
c = letters[1:3]
x = data.frame(a = a, b = b, c = c)
print(x)
  a b c
1 1 1 a
2 2 2 b
3 3 3 c
print(names(x))
[1] "a" "b" "c"

RefresheR

Use list operators to extract columns

print(x$a)
[1] 1 2 3
print(x[[2]])    # a vector of factors
[1] 1 2 3
Levels: 1 2 3
print(x[['c']])  # different factors
[1] a b c
Levels: a b c

RefresheR

Or matrix or vector indexing

print(x[1:2, 2:3])
  b c
1 1 a
2 2 b
print(x[2])
  b
1 1
2 2
3 3

RefresheR

Subsetting

y = subset(x, b %in% 2:3, select = c(b, c))
print(y)
  b c
2 2 b
3 3 c

Lots of new fancy packages for manipulating data frames. See reshape, plyr, dplyr, sqldf and others.

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Raison d’être of R is modeling

x = rnorm(100)
y = 1 + 2 * x + rnorm(100) 
plot(y ~ x)

plot of chunk unnamed-chunk-39

RefresheR

Raison d’être of R is modeling

mod1 = lm(y ~ x)  # y = b0 + b1 * x
summary(mod1)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5983 -0.7540 -0.0803  0.6819  2.4079 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    1.054      0.103    10.2   <2e-16 ***
x              2.075      0.106    19.5   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.03 on 98 degrees of freedom
Multiple R-squared:  0.795, Adjusted R-squared:  0.793 
F-statistic:  381 on 1 and 98 DF,  p-value: <2e-16

RefresheR

S3 classes and methods

print(class(mod1))            # S3 class
[1] "lm"
methods(class = class(mod1))  # S3 methods
 [1] add1.lm*           alias.lm*          anova.lm*         
 [4] case.names.lm*     confint.lm         cooks.distance.lm*
 [7] deviance.lm*       dfbeta.lm*         dfbetas.lm*       
[10] drop1.lm*          dummy.coef.lm      effects.lm*       
[13] extractAIC.lm*     family.lm*         formula.lm*       
[16] hatvalues.lm*      influence.lm*      kappa.lm          
[19] labels.lm*         logLik.lm*         model.frame.lm*   
[22] model.matrix.lm    nobs.lm*           plot.lm*          
[25] predict.lm         print.lm*          proj.lm*          
[28] qqnorm.lm*         qr.lm*             residuals.lm      
[31] rstandard.lm*      rstudent.lm*       simulate.lm*      
[34] summary.lm         variable.names.lm* vcov.lm*          

   Non-visible functions are asterisked

RefresheR

Calling a generic method

plot(mod1)

plot of chunk unnamed-chunk-42 plot of chunk unnamed-chunk-42 plot of chunk unnamed-chunk-42 plot of chunk unnamed-chunk-42

RefresheR

Raison d’être of R is modeling

anova(mod1)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)    
x          1    404     404     381 <2e-16 ***
Residuals 98    104       1                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model with x is much better

RefresheR

Model syntax

Y ~ X          # Y = B0 + B1 * X
Y ~ 0 + X      # Y = B1 * X
Y ~ X1 + X2    # Y = B0 + B1 * X1 + B2 * X2
Y ~ X1 * X2    # Y = B0 + B1 * X1 + B2 * X2 + B3 * X1 * X2
Y ~ I(X / 2)   # Y = B0 + B1 * (X / 2)

RefresheR

Peaking under the hood

str(mod1[1:6])
List of 6
 $ coefficients : Named num [1:2] 1.05 2.08
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
 $ residuals    : Named num [1:100] -0.243 -0.283 -0.848 0.539 1.863 ...
  ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
 $ effects      : Named num [1:100] -11.892 -20.09 -0.804 0.554 1.912 ...
  ..- attr(*, "names")= chr [1:100] "(Intercept)" "x" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:100] 0.767 2.011 2.875 0.578 3.303 ...
  ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
 $ assign       : int [1:2] 0 1

- S3 objects are usually lists with a class attribute - str can be helpful with “what the heck is that?”

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Mixed effects with S4 classes \[Y = X \beta + Zu + \epsilon\]

library(lme4)
x = rnorm(100)
z = rbinom(100, 1, 0.5)
y = 1 + 2 * x - 3 * z + rnorm(100)

RefresheR

Mixed effects with S4 classes \[Y = X \beta + Zu + \epsilon\]

library(lattice)
xyplot(y ~ x | z, cex = 3)       # cex is graphical parameter

plot of chunk unnamed-chunk-47

RefresheR

Mixed effects with S4 classes \[Y = X \beta + Zu + \epsilon\]

mod2 = lmer(y ~ x + (1 | z))  # z is random intercept
print(mod2)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (1 | z)
REML criterion at convergence: 281.9
Random effects:
 Groups   Name        Std.Dev.
 z        (Intercept) 2.007   
 Residual             0.946   
Number of obs: 100, groups:  z, 2
Fixed Effects:
(Intercept)            x  
     -0.436        2.214  

RefresheR

What is mod2?

class(mod2)
[1] "lmerMod"
attr(,"package")
[1] "lme4"
isS4(mod2)
[1] TRUE

RefresheR

A generic method applied to S4 object

ranef(mod2)
$z
  (Intercept)
0       1.416
1      -1.416

RefresheR

S4 object have slots

slotNames(mod2)
 [1] "resp"    "Gp"      "call"    "frame"   "flist"   "cnms"    "lower"  
 [8] "theta"   "beta"    "u"       "devcomp" "pp"      "optinfo"
show(mod2@beta)  # show is S4 print
[1] -0.4365  2.2141

Fixed effects stored in beta

RefresheR

Access S4 slots with @ or slot method

head(mod2@frame, n = 3)  # the data
       y       x z
1  2.808  0.9571 0
2 -3.441 -1.4161 0
3 -4.325 -2.7417 0
head(slot(mod2, 'frame'), n = 3)
       y       x z
1  2.808  0.9571 0
2 -3.441 -1.4161 0
3 -4.325 -2.7417 0

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

Iterators increment or decrement on each call rather than existing as a vector of values.

library(iterators)
i = iter(1:3)
c(nextElem(i), nextElem(i), nextElem(i))
[1] 1 2 3

Iterators can be convenient but really shine when they allow a computation to proceed incrementally without storing the entire iterator sequence in memory.

RefresheR

foreach evaluates an expression for each value of an iterator sequence

library(foreach)
foreach(i = 1:3) %do% rnorm(1)
[[1]]
[1] -0.3558

[[2]]
[1] 0.7485

[[3]]
[1] -2.969

The real action is in the %do% infix operator.

RefresheR

The .combine argument gives more control

library(foreach)
foreach(i = 1:3, .combine = c) %do% rnorm(1)
[1]  1.2698  0.1409 -1.5221

This is a form of map (the expression) and reduce (the .combine function)

RefresheR

Something a bit more challenging

f = function(n = 1e7) prod(rnorm(n))
system.time(f())
   user  system elapsed 
  1.097   0.026   1.123 

About 1.5 seconds on my laptop

RefresheR

Something a bit more challenging

system.time(
  foreach(i = 1:5, .combine = prod) %do% f()
)
   user  system elapsed 
  5.028   0.056   5.084 

About 6.5 seconds on my laptop

RefresheR

Now the really cool part

library(doMC)   # multicore extensions
registerDoMC()  # create parallel engine
system.time(
  foreach(i = 1:5, .combine = prod) %dopar% f()
)
   user  system elapsed 
  0.014   0.031   1.427 

About 2.5 seconds on my laptop. Here %dopar% automatically splits the computation across all of the available cores.

Note: this may crash RStudio under some circumstances. Better to run %dopar% on the command line.

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops

RefresheR

R does linear algebra

x = 1:3       # [1, 2, 3]
x %*% x       # inner product
     [,1]
[1,]   14
x %*% t(x)    # outer product
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
[3,]    3    6    9

RefresheR

R does linear algebra

a = matrix(1:9, 3); print(a)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
a %*% x      # matrix - vector product
     [,1]
[1,]   30
[2,]   36
[3,]   42

RefresheR

R does linear algebra

a %*% a      # matrix - matrix product
     [,1] [,2] [,3]
[1,]   30   66  102
[2,]   36   81  126
[3,]   42   96  150
eigen(a)     # eigenvalue decomposition
$values
[1]  1.612e+01 -1.117e+00 -5.701e-16

$vectors
        [,1]    [,2]    [,3]
[1,] -0.4645 -0.8829  0.4082
[2,] -0.5708 -0.2395 -0.8165
[3,] -0.6770  0.4039  0.4082

RefresheR

R does linear algebra

RefresheR

type: sub-section - Getting started - Getting data in and out - Basic syntax and control flow - Vectorized expressions - Array indices - Functions and functional programming - Lists and data frames - Model syntax and S3 methods - Model syntax and S4 methods - Iterators and foreach - Matrix-vector ops